Install and load awst

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("drisso/awst")
library(awst)

Data import and cleaning

The collection of SEQC datasets is available through the seqc Bioconductor package, which can be installed as follows.

BiocManager::install("seqc")

We next build the data matrix from the “ILM_aceview” experiments. We remove duplicate gene symbols, ERCC spike-ins, and genes with no ENTREZ ID.
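
The filtering step described above can be sketched on a toy annotation table. The column names (`Symbol`, `EntrezID`, `IsERCC`), the toy values, and the choice to keep the first row of each duplicated symbol are assumptions made for illustration; the chunk actually used for the analysis is not shown here.

```r
# Toy stand-in for a gene-level table (hypothetical column names and values).
toy <- data.frame(
  Symbol   = c("GENE1", "GENE2", "GENE2", "ERCC-00002", "GENE3"),
  EntrezID = c("101", "102", "102", NA, NA),
  IsERCC   = c(FALSE, FALSE, FALSE, TRUE, FALSE),
  A_1      = c(10, 0, 5, 200, 3),
  B_1      = c(12, 1, 4, 180, 0),
  stringsAsFactors = FALSE
)

keep <- !toy$IsERCC &              # drop ERCC spike-ins
  !is.na(toy$EntrezID) &           # drop genes without an Entrez ID
  !duplicated(toy$Symbol)          # keep the first row of each gene symbol
filtered <- toy[keep, ]

# Count matrix with gene symbols as row names (genes in rows, samples in columns).
counts <- as.matrix(filtered[, c("A_1", "B_1")])
rownames(counts) <- filtered$Symbol
counts
```

With real data, the same logical mask would be applied to each site-level table before the sample columns are bound together.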

Distribution of samples per site
      AGR BGI CNL COH MAY NVS Sum
A       4   5   5   4   5   4  27
B       4   5   5   4   5   4  27
C       4   5   5   4   5   4  27
D       4   5   5   4   5   4  27
Sum    16  20  20  16  20  16 108
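
A table of this shape can be produced with `table()` and `addmargins()`. The sketch below rebuilds a sample-by-site design from the counts reported above; the `design` data frame is a hypothetical stand-in for the real sample annotation.

```r
sites   <- c("AGR", "BGI", "CNL", "COH", "MAY", "NVS")
reps    <- c(4, 5, 5, 4, 5, 4)          # libraries per sample type at each site
samples <- c("A", "B", "C", "D")

# One row per (sample, site) pair, then repeat each row once per library.
design <- expand.grid(sample = samples, site = sites, stringsAsFactors = FALSE)
n_rep  <- rep(reps, each = length(samples))
design <- design[rep(seq_len(nrow(design)), times = n_rep), ]

# Cross-tabulate and append row and column totals.
tab <- addmargins(table(design$sample, design$site))
tab
```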

Figure 2 and Supplementary figure 1

Figure 2a and Supplementary figure 1a: clustering on RSEM data after awst

Figure 2b and Supplementary figure 1b: clustering on CPM data after awst

Figure 2c: clustering on CPM data (std values)

Figure 2d and Supplementary figure 1d: clustering on CPM data (std values; top 100 genes)
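
As a rough illustration of clustering on standardized CPM values restricted to the most variable genes, the following sketch uses simulated counts. The log2 transformation, the variance-based gene ranking, the Euclidean distance, the Ward linkage, and the choice of four clusters are assumptions for this example, not necessarily the settings behind the figure.

```r
set.seed(1)
# Toy count matrix: 500 genes (rows) by 12 samples (columns); values are simulated.
counts <- matrix(rnbinom(500 * 12, mu = 50, size = 2), nrow = 500,
                 dimnames = list(paste0("g", 1:500), paste0("s", 1:12)))

# Counts per million: rescale each sample (column) to a library size of 1e6.
cpm    <- t(t(counts) / colSums(counts)) * 1e6
logcpm <- log2(cpm + 1)

# Keep the 100 genes with the largest variance across samples.
top <- order(apply(logcpm, 1, var), decreasing = TRUE)[1:100]

# Standardize each gene to mean 0 and unit variance across samples.
z <- t(scale(t(logcpm[top, ])))

# Hierarchical clustering of samples: Euclidean distance, Ward linkage.
hc       <- hclust(dist(t(z)), method = "ward.D2")
clusters <- cutree(hc, k = 4)
table(clusters)
```

Changing the `1:100` index to a larger range gives the variant that uses more genes.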

Figure 2e: clustering on TPM data (std values)
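
For reference, TPM values can be derived from counts and gene lengths as in the minimal sketch below. The toy counts and the `gene_length` vector are hypothetical; only the formula (length-normalize, then rescale each sample to one million) is the standard TPM definition.

```r
# Toy inputs (hypothetical): counts for 3 genes in 2 samples, gene lengths in bp.
counts <- matrix(c(100, 200, 300,
                    50,  50, 400), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
gene_length <- c(g1 = 1000, g2 = 2000, g3 = 3000)

# Length-normalized rate: row i of the matrix is divided by gene_length[i].
rate <- counts / gene_length

# Rescale each sample (column) so that its values sum to one million.
tpm <- t(t(rate) / colSums(rate)) * 1e6
colSums(tpm)
```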

Figure 2f and Supplementary figure 1f: clustering on TPM data (std values; top 100 genes)

Figure 2g and Supplementary figure 1g: clustering on TPM data after awst

Figure 2h and Supplementary figure 1h: clustering on TPM data after Hart transformation
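
One common reading of the Hart transformation is a z-score of log2 expression relative to a Gaussian fitted to the right half of the main peak of the distribution. The sketch below follows that reading on simulated values; the helper name `hart_transform`, the kernel-density mode estimate, and the handling of zeros are assumptions, and the implementation used for the figure may differ.

```r
# Hypothetical helper: standardize log2 expression against a Gaussian
# fitted to the values on the right of the density mode.
hart_transform <- function(x) {
  pos <- x > 0
  lx  <- log2(x[pos])
  d   <- density(lx)
  mu  <- d$x[which.max(d$y)]            # mode of the log2 distribution
  u   <- mean(lx[lx > mu])              # mean of the values right of the mode
  sigma <- (u - mu) * sqrt(pi / 2)      # half-normal mean equals sigma * sqrt(2 / pi)
  out <- rep(NA_real_, length(x))       # zeros have no log2 value and stay NA
  out[pos] <- (lx - mu) / sigma
  out
}

set.seed(1)
tpm <- 2^rnorm(2000, mean = 3, sd = 2)  # simulated log-normal expression values
z   <- hart_transform(tpm)
summary(z)
```

The function is applied to one sample at a time; for a genes-by-samples matrix it could be used column-wise with `apply(tpm_matrix, 2, hart_transform)`.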

Figure 3 and Supplementary figure 2

3 perturbed samples

Figure 3a and Supplementary figure 2a: clustering on CPM data (std values; top 2500 genes)

Figure 3e and Supplementary figure 2e: clustering on RSEM data after awst

Two-thirds perturbed samples

Figure 3c and Supplementary figure 2c: clustering on CPM data (std values; top 2500 genes)

Figure 3g: clustering on RSEM data after awst

All perturbed samples

Figure 3d and Supplementary figure 2d: clustering on CPM data (std values; top 2500 genes)

Figure 3h and Supplementary figure 2h: clustering on RSEM data after awst

Session info

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] seqc_1.18.0       awst_0.0.3        dendextend_1.12.0 cluster_2.1.0    
## [5] knitr_1.25       
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.2          highr_0.8           pillar_1.4.2       
##  [4] compiler_3.6.1      viridis_0.5.1       tools_3.6.1        
##  [7] digest_0.6.21       evaluate_0.14       tibble_2.1.3       
## [10] gtable_0.3.0        viridisLite_0.3.0   pkgconfig_2.0.3    
## [13] rlang_0.4.0         parallel_3.6.1      yaml_2.2.0         
## [16] xfun_0.10           gridExtra_2.3       stringr_1.4.0      
## [19] dplyr_0.8.3         grid_3.6.1          tidyselect_0.2.5   
## [22] Biobase_2.44.0      glue_1.3.1          R6_2.4.0           
## [25] rmarkdown_1.16      ggplot2_3.2.1       purrr_0.3.2        
## [28] magrittr_1.5        BiocGenerics_0.30.0 scales_1.0.0       
## [31] htmltools_0.4.0     assertthat_0.2.1    colorspace_1.4-1   
## [34] stringi_1.4.3       lazyeval_0.2.2      munsell_0.5.0      
## [37] crayon_1.3.4